C2.1 Tandem repeats

Tandem repeats were searched for across the genome using Tandem Repeats Finder (TRF)37. Tandem repeats identified by de novo prediction were also included.

C2.2 Transposable elements (TEs)
TEs were identified in the genome using an approach combining both homology-based and de novo predictions.


Homology-based prediction: The homology-based approach searches commonly used databases for known TEs. We used RepeatProteinMask and RepeatMasker with Repbase, which contains a large collection of known TEs. TEs in the genome assembly were identified at both the DNA and protein levels. RepeatMasker was used for DNA-level identification with a general library (repeatmasker libraries-20110419). At the protein level, RepeatProteinMask, an updated program included in the RepeatMasker package, was used to run a WU-BLASTX search against the TE protein database. The distribution of divergence rates of the detected TEs is presented in Fig. S10.
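A distribution like the one in Fig. S10 can be tabulated from RepeatMasker's tabular .out file. The sketch below is one hypothetical way to do it, and not necessarily how Fig. S10 was produced. It assumes the standard .out column layout, with the Smith-Waterman score in column 1 and percent divergence from the consensus in column 2, and it bins hits into 1% divergence classes. The function name divergence_histogram is illustrative.

```shell
# Hypothetical helper: count RepeatMasker hits per 1% divergence bin.
# Assumes the standard .out layout: column 1 = Smith-Waterman score,
# column 2 = percent divergence from the consensus. Header and blank
# lines are skipped because their first field is not an integer.
divergence_histogram() {
  awk '
    $1 ~ /^[0-9]+$/ && NF >= 11 {
      bin = int($2)                    # e.g. 12.7% falls in bin 12
      n[bin]++
      if (max == "" || bin > max) max = bin
    }
    END {
      if (max == "") exit              # no hits: print nothing
      for (b = 0; b <= max; b++) printf "%d\t%d\n", b, n[b] + 0
    }
  ' "$@"
}
```

For example, `divergence_histogram oyster.v9.fa.out` would print two tab-separated columns (divergence bin, hit count) ready for plotting. Empty bins are printed as zero so the x-axis stays continuous.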

De novo prediction: Two de novo prediction programs, LTR_FINDER38 and RepeatScout39, were used to construct a de novo repeat library from the assembled genome. The two programs predict repeats in different ways: 1) LTR_FINDER scans the whole genome for the characteristic structure of full-length long terminal repeat (LTR) retrotransposons. The ~18 bp primer binding site of these elements is complementary to the 3' tail of a tRNA, so we used the Strongylocentrotus tRNA library as a reference; 2) RepeatScout builds consensus sequences by extending frequent l-mers with a fit-preferred alignment score. Contaminants and multi-copy genes were first filtered out of the library. The filtered RepeatScout library was then used as the repeat library for RepeatMasker, which was run again to find homologs in the genome and to classify the repeats found.
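RepeatScout starts from a table of how often each l-mer (a word of length l) occurs in the genome. Frequent l-mers then serve as seeds that are extended into consensus repeats. The awk sketch below illustrates only that counting step, under the simplifying assumptions stated in its comments. It is not RepeatScout's code, and it omits both the fit-preferred extension and the library filtering described above. The name count_lmers is hypothetical.

```shell
# Illustrative only: count every l-mer (word of length L) in a FASTA file and
# report those seen at least MIN times, the kind of frequency table RepeatScout
# seeds its repeat consensus from. Unlike RepeatScout, this sketch ignores
# reverse complements, does no low-complexity masking and keeps every l-mer
# in memory, so it only suits small test sequences.
# Usage: count_lmers L MIN [file.fa ...]   (reads stdin if no file is given)
count_lmers() {
  local l=$1 min=$2
  shift 2
  awk -v l="$l" -v min="$min" '
    function flush(   i, w) {
      for (i = 1; i + l - 1 <= length(seq); i++) {
        w = substr(seq, i, l)
        if (w !~ /N/) cnt[w]++         # skip windows spanning N gaps
      }
      seq = ""
    }
    /^>/ { flush(); next }             # header: count the previous record
    { seq = seq toupper($0) }          # join wrapped sequence lines
    END {
      flush()
      for (w in cnt) if (cnt[w] >= min) print cnt[w] "\t" w
    }
  ' "$@" | sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

The output is sorted by count and then by l-mer. The most frequent words at the top are the ones a seed-and-extend tool would try first.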



Tandem Repeats 
TRF

http://tandem.bu.edu/trf/trf.unix.help.html

./trf /Volumes/Bay3/Software/trf/in/oyster.v9_M.fa 2 7 7 80 10 50 500 -f -d -m -h

oyster.v9_M.fa.2.7.7.80.10.50.500.dat
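According to the TRF help page linked above, the seven numbers after the input file are positional. They are match weight (2), mismatch penalty (7), indel penalty (7), match probability PM (80), indel probability PI (10), minimum alignment score to report (50) and maximum period size (500). The flags have the following effects: -f adds flanking sequence, -d writes the .dat data file, -m writes a masked sequence file and -h suppresses HTML output. The function below is a hypothetical helper that summarizes the resulting .dat file per sequence. It assumes each record opens with a "Sequence: <name>" line and that each repeat line starts with its start and end coordinates.

```shell
# Hypothetical helper: number of repeats and summed repeat span per sequence
# in a TRF .dat file. Assumes the .dat layout in which a "Sequence: <name>"
# line opens each record and every repeat line starts with its start and end
# coordinates. Overlapping repeats are counted separately (spans not merged).
trf_dat_summary() {
  awk '
    $1 == "Sequence:" { name = $2; next }
    $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && NF >= 14 {
      n[name]++
      bp[name] += $2 - $1 + 1
    }
    END { for (s in n) printf "%s\t%d\t%d\n", s, n[s], bp[s] }
  ' "$@" | sort -t "$(printf '\t')" -k1,1
}
```

For example, `trf_dat_summary oyster.v9_M.fa.2.7.7.80.10.50.500.dat` would print one line per scaffold with the scaffold name, repeat count and summed span.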




./trf /Volumes/Bay3/Software/trf/in/oyster.v9_M.fa 2 7 7 80 10 50 500 


http://eagle.fish.washington.edu/cnidarian/trf_012113/oyster.v9_M.fa.2.7.7.80.10.50.500.summary.html


RepeatMasker
./RepeatMasker -dir /Volumes/Bay3/Software/RepeatMasker/out/ -gff /Volumes/Bay3/Software/trf/in/oyster.v9_M.fa 


oyster.v9_M.fa.out.gff
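The -gff option makes RepeatMasker write its hits as GFF (version 2) in addition to the .out table. The sketch below tallies hits and summed length per repeat name. It assumes that the name appears in the attribute column as Target "Motif:<name>" <start> <end>, which is the layout RepeatMasker uses; check a few lines of the actual file before relying on it. The function name gff_motif_summary is hypothetical.

```shell
# Hypothetical helper: hits and summed bp per repeat name in a RepeatMasker
# GFF file. Assumes the GFF2 layout RepeatMasker writes, with the repeat name
# in column 9 as Target "Motif:<name>" <start> <end>.
# Output: name, hit count, summed bp; sorted by bp (largest first).
gff_motif_summary() {
  awk -F'\t' '
    /^#/ || NF < 9 { next }            # skip ## header lines
    match($9, /Motif:[^"]+/) {
      m = substr($9, RSTART + 6, RLENGTH - 6)   # drop the "Motif:" prefix
      n[m]++
      bp[m] += $5 - $4 + 1
    }
    END { for (m in n) printf "%s\t%d\t%d\n", m, n[m], bp[m] }
  ' "$@" | sort -t "$(printf '\t')" -k3,3nr -k1,1
}
```

For example, `gff_motif_summary oyster.v9_M.fa.out.gff | head` would list the repeat families that account for the most sequence.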

------

RepeatProteinMask
./RepeatProteinMask -noLowSimple /Volumes/Bay3/Software/trf/in/oyster.v9_M.fa 




./RepeatProteinMask /Volumes/Bay3/Software/trf_data/in/oyster.v9_M.fa 
Masking Simple and Low Complexity Repeats...
   - TRF : 9219
   - RepeatMasker: 0

Masking Repeat Proteins...
   - Protein Hits = 7772
oyster.v9_M.fa.annot

http://eagle.fish.washington.edu/cnidarian/TJGR_RepeatProteinMask_1_oyster.v9_M.fa.masked




Run on oyster.v9_90


./RepeatProteinMask /Volumes/Bay3/Software/trf_data/in/oyster.v9_90.fa 








--


./RepeatMasker -dir /Volumes/Bay3/Software/RepeatMasker/out/ -gff oyster.v9_90.fa




Run on oyster.v9

./RepeatProteinMask /Volumes/Bay3/Software/trf_data/in/oyster.v9.fa  


Masking Simple and Low Complexity Repeats...
   - TRF         : 66143
   - RepeatMasker: 0


Masking Repeat Proteins...
   - Protein Hits = 58468
Done!

http://eagle.fish.washington.edu/cnidarian/TJGR_oyster.v9.fa.annot


Tab-delimited version
http://eagle.fish.washington.edu/cnidarian/qDOD_RepeatProteinMask_v9.txt
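One generic way to turn whitespace-aligned program output, such as the .annot file, into a tab-delimited table is to let awk rebuild each record with a tab as the output field separator. This sketch does not assume any particular .annot column layout. It does assume that no field contains embedded spaces, so the header line should be checked first. The name to_tab_delimited is hypothetical, and this is not necessarily how the file above was made.

```shell
# Generic sketch: rewrite whitespace-aligned columns as tab-delimited text.
# Assigning $1 = $1 forces awk to rebuild the record using OFS (a tab).
# Blank lines are dropped. Fields must not contain embedded spaces.
to_tab_delimited() {
  awk 'BEGIN { OFS = "\t" } NF { $1 = $1; print }' "$@"
}
```

For example, `to_tab_delimited oyster.v9.fa.annot > oyster.v9.fa.annot.txt` would write a tab-delimited copy alongside the original.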






---

./RepeatMasker -dir /Volumes/Bay3/Software/RepeatMasker/out/042313 -gff /Volumes/Bay3/Software/RepeatMasker/oyster.v9.fa

 sr320$ ./RepeatMasker -dir /Volumes/Bay3/Software/RepeatMasker/out/042313 -gff /Volumes/Bay3/Software/RepeatMasker/oyster.v9.fa
RepeatMasker version open-3.3.0
Search Engine: NCBI/RMBLAST
Master RepeatMasker Database: /Volumes/Bay3/Software/RepeatMasker/Libraries/RepeatMaskerLib.embl ( Complete Database: 20120418 )



analyzing file /Volumes/Bay3/Software/RepeatMasker/oyster.v9.fa

Some previous RepeatMasker output files were moved to the directory
/Volumes/Bay3/Software/RepeatMasker/out/042313/oyster.v9.fa.preTueApr231626372013.RMoutput
in order not to overwrite them.


Checking for E. coli insertion elements
identifying simple repeats in batch 1 of 10832
identifying full-length ALUs in batch 1 of 10832
identifying full-length interspersed repeats in batch 1 of 10832
identifying remaining ALUs in batch 1 of 10832
identifying most interspersed repeats in batch 1 of 10832
identifying long interspersed repeats in batch 1 of 10832
identifying ancient repeats in batch 1 of 10832
identifying retrovirus-like sequences in batch 1 of 10832
identifying tough LINE1s in batch 1 of 10832
identifying more simple repeats in batch 1 of 10832
identifying low complexity regions in batch 1 of 10832

…..

http://eagle.fish.washington.edu/cnidarian/TJGR_oyster.v9.fa.out.gff
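The log does not report how much of oyster.v9 ends up masked. A figure like that can be estimated from the GFF if overlapping hits are merged first, so that bases covered by several repeats are counted once. The function below is a hypothetical sketch. It takes assembly size as every non-header FASTA character, N gaps included, and it assumes GFF columns 1, 4 and 5 hold the scaffold name, start and end.

```shell
# Hypothetical helper: percent of an assembly covered by RepeatMasker GFF
# features. Overlapping or abutting hits on the same scaffold are merged
# first, so bases hit by several repeats are counted once.
# Usage: masked_percent <assembly.fa> <repeatmasker.gff>
masked_percent() {
  local fasta=$1 gff=$2 total
  # Assembly size: every non-header character, N gaps included.
  total=$(awk '!/^>/ { n += length($0) } END { print n + 0 }' "$fasta")
  grep -v '^#' "$gff" \
    | awk -F'\t' 'NF >= 9 { print $1 "\t" $4 "\t" $5 }' \
    | sort -t "$(printf '\t')" -k1,1 -k2,2n \
    | awk -F'\t' -v total="$total" '
        function close_run() { if (cur != "") covered += e - s + 1 }
        # New scaffold, or a gap after the current run: close the run.
        $1 != cur || $2 > e + 1 { close_run(); cur = $1; s = $2; e = $3; next }
        # Otherwise the hit overlaps or abuts the run: extend it if needed.
        $3 > e { e = $3 }
        END {
          close_run()
          pct = total > 0 ? 100 * covered / total : 0
          printf "%.2f\n", pct
        }'
}
```

For example, `masked_percent oyster.v9.fa oyster.v9.fa.out.gff` would print a single percentage. Excluding N gaps from the denominator would give a higher figure, so the choice should be stated alongside any reported value.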